Multi-language Machine Translation through Interactive Document Normalization
Abstract
Document normalization is an interactive process that transforms raw legacy documents into semantically well-formed and linguistically controlled documents with the same communicative content. A paradigm for content analysis has been implemented to select candidate semantic representations of the communicative content of an input document. This implementation reuses the formal content specification of a multilingual controlled authoring system. As a consequence, a candidate semantic representation can be associated with a text not only in the language of the input document but also in every language supported by the system. This paper presents how multilingual versions of an input legacy document can be obtained interactively with the proposed implementation, and discusses the advantages and limitations of this kind of normalizing translation.
Similar resources
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...
A Graph-based Approach to Cross-language Multi-document Summarization
Cross-language summarization is the task of generating a summary in a language different from the language of the source documents. In this paper, we propose a graph-based approach to multi-document summarization that integrates machine translation quality scores in the sentence extraction process. We evaluate our method on a manually translated subset of the DUC 2004 evaluation campaign. Resul...
The Effect of Metapragmatic Awareness, Interactive Translation, and Discussion through Video-Enhanced Input on EFL Learners’ Comprehension of Implicature
It is substantiated that particular features of pragmatics are teachable, and instruction is both necessary and effective. Determining what kind of intervention is most effectual for facilitating learners’ pragmatic development has been a central issue for researchers. To respond to the inconclusive findings in intervention studies and to extend the instructional studies in L2 pragmatics to les...
Text normalization based on statistical machine translation and internet user support
In this paper, we describe and compare systems for text normalization based on statistical machine translation (SMT) methods which are constructed with the support of internet users. Internet users normalize text displayed in a web interface, thereby providing a parallel corpus of normalized and non-normalized text. With this corpus, SMT models are generated to translate non-normalized into norm...
Normalizing Medieval German Texts: from rules to deep learning
The application of NLP tools to historical texts is complicated by a high level of spelling variation. Different methods of historical text normalization have been proposed. In this comparative evaluation I test the following three approaches to text canonicalization on historical German texts from the 15th–16th centuries: rule-based, statistical machine translation, and neural machine translation....
Publication date: 2003